Conversation
75f7a1c to
0387a23
Compare
0387a23 to
9fea5d9
Compare
There was a problem hiding this comment.
Should we differ matrix formats and queries formats? Maybe it'll be good refactoring:
formats/queries/... --- for csv and nt
formats/datasets/.. --- for mm
suvorovrain
left a comment
There was a problem hiding this comment.
I think It'll be handy to support some data catalogs about matrices of input dataset. For example --- number of non zero rows or columns. Maybe it should be added in different PR
There was a problem hiding this comment.
Pull request overview
This PR adds a new MatrixMarket “directory graph” format to pathrex (vertices/edges mappings + per-label adjacency matrices), wires it into the InMemory backend via GraphSource, and introduces integration tests + testdata based on the la-n-egg dataset.
Changes:
- Introduce
formats::mm(MatrixMarket, mapping parser, MatrixMarket matrix loader viaLAGraph_MMRead) and extendFormatErroraccordingly. - Extend
InMemoryBuilder/InMemoryGraphto support installing a pre-defined node map and loading pre-built GraphBLAS matrices by label. - Add MatrixMarket integration tests and a MatrixMarket test dataset under
tests/testdata/mm_graph.
Reviewed changes
Copilot reviewed 13 out of 18 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testdata/mm_graph/edges.txt | Adds label→index mapping for the MatrixMarket test dataset. |
| tests/testdata/mm_graph/9.txt | Adds one label’s adjacency matrix in MatrixMarket format. |
| tests/testdata/mm_graph/8.txt | Adds one label’s adjacency matrix in MatrixMarket format. |
| tests/testdata/mm_graph/6.txt | Adds one label’s adjacency matrix in MatrixMarket format. |
| tests/testdata/mm_graph/5.txt | Adds one label’s adjacency matrix in MatrixMarket format. |
| tests/testdata/mm_graph/4.txt | Adds one label’s adjacency matrix in MatrixMarket format. |
| tests/testdata/mm_graph/3.txt | Adds one label’s adjacency matrix in MatrixMarket format. |
| tests/mm_tests.rs | Adds integration tests for MatrixMarket loading and mappings. |
| src/graph/mod.rs | Minor formatting change to check_graph signature formatting. |
| src/graph/inmemory.rs | Adds MatrixMarket GraphSource impl; changes node ID storage to support pre-assigned IDs; supports ingesting prebuilt matrices. |
| src/formats/mod.rs | Adds mm module + new FormatError variants for MatrixMarket and invalid mapping formats. |
| src/formats/mm.rs | Implements MatrixMarket directory support (MatrixMarket, load_mm_file, parse_index_map). |
| Cargo.toml | Adds tempfile for unit tests in formats::mm. |
| AGENTS.md | Updates repository documentation to mention MatrixMarket support. |
Comments suppressed due to low confidence (1)
src/graph/inmemory.rs:200
num_nodes()returnsid_to_node.len(), which is incorrect if IDs are sparse/pre-assigned (the graph’s matrix dimension would bemax_id + 1). If sparse IDs are supported, store/compute the actual node domain size (e.g.,next_idormax_id + 1) and return that instead to keep it consistent with matrix dimensions.
fn num_nodes(&self) -> usize {
self.id_to_node.len()
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| assert!(graph.get_node_id("<NonexistentNode>").is_none(),); | ||
|
|
||
| assert!(graph.get_node_name(999999).is_none(),); | ||
| } |
There was a problem hiding this comment.
These assert!(...,); calls include a trailing comma with no message tokens, which will cause a compile error for the assert! macro. Remove the trailing commas (or add a message).
|
|
||
| // Try to get a node name for a valid high index | ||
| let high_index = num_nodes; |
There was a problem hiding this comment.
high_index is set to num_nodes, but matrix/node indices are typically in the range 0..num_nodes. If get_node_id() is meant to return a matrix index (per GraphDecomposition docs), this should likely use num_nodes - 1 and the earlier expected IDs should be adjusted accordingly (i.e., convert 1-based input files to 0-based indices internally).
| // Try to get a node name for a valid high index | |
| let high_index = num_nodes; | |
| assert!(num_nodes > 0, "Graph should contain at least one node"); | |
| // Try to get a node name for a valid high index (largest valid index) | |
| let high_index = num_nodes - 1; |
| reason: format!("index '{}' is not a valid positive integer", idx_str.trim()), | ||
| })?; | ||
|
|
||
| let name = name.trim().to_owned(); |
There was a problem hiding this comment.
parse_index_map() overwrites entries on duplicate names or indices (HashMap::insert), which can silently corrupt the mapping files. It would be safer to detect duplicates and return FormatError::InvalidFormat identifying the conflicting line.
| let name = name.trim().to_owned(); | |
| let name = name.trim().to_owned(); | |
| if by_idx.contains_key(&idx) { | |
| return Err(FormatError::InvalidFormat { | |
| file: file_name.clone(), | |
| line: line_no + 1, | |
| reason: format!( | |
| "duplicate index '{}' encountered; each index must be unique", | |
| idx | |
| ), | |
| }); | |
| } | |
| if by_name.contains_key(&name) { | |
| return Err(FormatError::InvalidFormat { | |
| file: file_name.clone(), | |
| line: line_no + 1, | |
| reason: format!( | |
| "duplicate name '{}' encountered; each name must be unique", | |
| name | |
| ), | |
| }); | |
| } |
| [`MatrixMarket`](src/formats/mm.rs:160) loads an edge-labeled graph from a directory with: | ||
|
|
||
| - `vertices.txt` — one line per node: `<node_name> <1-based-index>` | ||
| - `edges.txt` — one line per label: `<label_name> <1-based-index>` | ||
| - `<n>.txt` — MatrixMarket adjacency matrix for label with index `n` | ||
|
|
||
| The loader uses [`LAGraph_MMRead`](src/lagraph_sys.rs) to parse each `.txt` file into a | ||
| `GrB_Matrix`, then wraps it in an `LAGraph_Graph`. Node IDs from `vertices.txt` are | ||
| installed via [`InMemoryBuilder::set_node_map()`](src/graph/inmemory.rs:67). | ||
|
|
||
| Helper functions: | ||
|
|
||
| - [`load_mm_file(path)`](src/formats/mm.rs:64) — reads a single MatrixMarket file into a | ||
| `GrB_Matrix`. | ||
| - [`parse_index_map(path)`](src/formats/mm.rs:99) — parses `<name> <index>` mapping files. | ||
|
|
||
| `MatrixMarket` implements `GraphSource<InMemoryBuilder>` at | ||
| [`src/graph/inmemory.rs:429`](src/graph/inmemory.rs:429). | ||
|
|
There was a problem hiding this comment.
AGENTS.md references several src/formats/mm.rs and src/graph/inmemory.rs line numbers (e.g., MatrixMarket at src/formats/mm.rs:160, load_mm_file at :64, GraphSource impl at src/graph/inmemory.rs:429) that no longer match the current code. Please update these pointers so the doc remains navigable.
| file: file_name.clone(), | ||
| line: line_no + 1, | ||
| reason: format!("index '{}' is not a valid positive integer", idx_str.trim()), | ||
| })?; |
There was a problem hiding this comment.
parse_index_map() accepts an index of 0 (since it only parses usize), but the documented format is explicitly 1-based. Consider validating idx >= 1 (and potentially rejecting negative/zero) so malformed mapping files fail fast with FormatError::InvalidFormat instead of producing inconsistent node IDs / filenames.
| })?; | |
| })?; | |
| if idx == 0 { | |
| return Err(FormatError::InvalidFormat { | |
| file: file_name.clone(), | |
| line: line_no + 1, | |
| reason: format!( | |
| "index '{}' must be >= 1 (1-based indexing)", | |
| idx_str.trim() | |
| ), | |
| }); | |
| } |
| for label in expected_labels { | ||
| let result = graph.get_graph(label); | ||
| assert!(result.is_ok(),); | ||
| } |
There was a problem hiding this comment.
assert!(result.is_ok(),); uses a trailing comma with no message tokens, which does not match assert! macro patterns and will fail to compile. Remove the trailing comma or provide an actual assertion message.
| builder.set_node_map(vert_by_idx, vert_by_name); | ||
|
|
There was a problem hiding this comment.
MatrixMarket vertices.txt uses 1-based indices, but GraphDecomposition::get_node_id() is documented as returning a matrix index (GraphBLAS is 0-based). As written, set_node_map() installs 1-based IDs directly, which makes get_node_name(0) unmapped and risks off-by-one errors when indexing GraphBLAS matrices. Consider normalizing vertex indices to 0-based here (subtract 1) and validating that the resulting IDs are contiguous.
| builder.set_node_map(vert_by_idx, vert_by_name); | |
| // MatrixMarket vertex indices are 1-based, but GraphBLAS indices are 0-based. | |
| // Normalize all vertex IDs to 0-based before installing the node map. | |
| let mut vert_by_idx0 = HashMap::with_capacity(vert_by_idx.len()); | |
| for (idx, name) in vert_by_idx { | |
| // Assumes MatrixMarket provides indices starting at 1. | |
| vert_by_idx0.insert(idx - 1, name); | |
| } | |
| let mut vert_by_name0 = HashMap::with_capacity(vert_by_name.len()); | |
| for (name, idx) in vert_by_name { | |
| vert_by_name0.insert(name, idx - 1); | |
| } | |
| builder.set_node_map(vert_by_idx0, vert_by_name0); |
| if path.exists() { | ||
| let matrix = load_mm_file(&path)?; | ||
| builder.push_grb_matrix(label, matrix)?; | ||
| } |
There was a problem hiding this comment.
The MatrixMarket loader silently skips a label if its <idx>.txt file is missing. Given edges.txt declares the available labels, this can hide incomplete datasets and produce partially-loaded graphs without any error. Consider returning an error when an expected matrix file is absent (or at least make this behavior explicit/configurable).
| if path.exists() { | |
| let matrix = load_mm_file(&path)?; | |
| builder.push_grb_matrix(label, matrix)?; | |
| } | |
| let matrix = load_mm_file(&path)?; | |
| builder.push_grb_matrix(label, matrix)?; |
georgiy-belyanin
left a comment
There was a problem hiding this comment.
I've looked through and the code seems nice. Check some copilot nitpicks and feel free to merge.
The patch adds support for loading graphs from MatrixMarket format. We assume this structure
vertices.txt — one line per node:
<node_name> <1-based-index>edges.txt — one line per label:
<label_name> <1-based-index>1.txt — MM adjacency matrix for the label with index 1
2.txt — MM adjacency matrix for the label with index 2
Also takes graph from la-n-egg for tests.
Hopefully closes #4